Development and Evaluation of an Italian Broadcast News Corpus

نویسندگان

  • Marcello Federico
  • Dimitri Giordani
  • Paolo Coletti
چکیده

This paper reports on the development and evaluation of an Italian broadcast news corpus at ITC-irst, under a contract with the European Language resources Distribution Agency (ELDA). The corpus consists of 30 hours of recordings transcribed and annotated with conventions similar to those adopted by the Linguistic Data Consortium for the DARPA HUB-4 corpora. The corpus will be completed and released to ELDA by April 2000.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Advances in automatic transcription of Italian broadcast news

This paper presents some recent improvements in automatic transcription of Italian broadcast news obtained at ITCirst. A first preliminary activity was carried out in order to develop a suitable speech corpus for the Italian language. The resulting corpus, formed by recordings covering 30 hours of radio news, was exploited for developing a baseline system for transcription of broadcast news. Th...

متن کامل

A baseline for the transcription of Italian broadcast news

This paper presents the first achievements in the development of a broadcast news transcription system to be applied for the processing of huge audio archives. In particular, the Italian broadcast news corpus under collection is introduced, and the first implemented baseline system is outlined. The baseline system consists of an audio segmentation module and a speech recognizer featuring a recu...

متن کامل

Speaker tracking in a broadcast news corpus

Speaker tracking is the process of following who says something in an audio stream. In the case the audio stream is a recording of broadcast news, speaker identity can be an important meta-data for building digital libraries. Moreover, the segmentation and classification of the audio stream in terms of acoustic contents, bandwidth and speaker gender allow to filter out portions of the signal wh...

متن کامل

TUKE-BNews-SK: Slovak Broadcast News Corpus Construction and Evaluation

This article presents an overview of the existing acoustical corpuses suitable for broadcast news automatic transcription task in the Slovak language. The TUKE-BNews-SK database created in our department was built to support the application development for automatic broadcast news processing and spontaneous speech recognition of the Slovak language. The audio corpus is composed of 479 Slovak TV...

متن کامل

Matbn 2002: a Mandarin Chinese Broadcast News Corpus

The MATBN 2002 Mandarin Chinese broadcast news corpus contains a total of 40 hours of broadcast news from Public Television Service Foundation (Taiwan) with corresponding transcripts. The primary motivation for this collection is to provide training and testing data for continuous speech recognition evaluation in the broadcast domain. We expect to collect and process 220 hours of Mandarin Chine...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000